Abstract: Pitch frequency is the fundamental frequency of a
speech signal. It is one of the most important parameters for
speech signal processing. Simulation results on the Keele pitch
reference database show that the proposed wavelet-transform-based
pitch detection algorithm performs clearly better than the
original AMDF and the algorithms based on its improvements.
I. Introduction

Pitch is one of the most important parameters for speech signal
processing, including speech synthesis, automatic speech
recognition, and speech enhancement. It is therefore very
important to extract the pitch from speech accurately. In recent
years, many pitch detection methods have been proposed
[1] [2] [3] [4].

During each period of voiced speech the glottis is excited and a
GCI (Glottal Closure Instant) occurs. This phenomenon
corresponds to a zero crossing in the waveform. If a speech
signal is filtered by a derivative function, a maximum will occur
at each zero crossing in the waveform. Pitch period detection
algorithms are generally divided into two categories: event
detection and non-event detection. Event detection algorithms
based on the autocorrelation function use the relatively
prominent peaks of the autocorrelation. Their shortcoming is that
they estimate the pitch period only for a certain vowel;
therefore, their efficiency is reduced where the speech signal is
non-stationary. In non-event detection methods, the pitch period
of a segment of the speech signal is calculated by methods such
as the cepstrum or the average magnitude difference function
(AMDF). However, the AMDF exhibits a falling trend as a global
feature [5], so detection errors often occur in which the
estimated pitch is half or a multiple of the actual pitch. To
avoid these errors, several improvements of the conventional AMDF
have been proposed [5] [6]. These improvements mainly either
modify the definition of the AMDF (such as CAMDF [5]) or adjust
the length of the frame used to compute the AMDF (such as
EAMDF [6]). A modified AMDF based on Empirical Mode
Decomposition (EMD) [7] has also been used to estimate pitch, but
its results are not very satisfactory and it introduces other
unexpected errors. These methods determine the pitch period by a
direct approach and are therefore less computationally intensive
when they operate on windowed speech. However, they are not
suitable for a wide range of speech sources.

During the last few years, the wavelet transform has been used as
a tool to analyze many kinds of problems. Kadambe showed that
when a GCI occurs in a speech signal, there are coincident local
maxima in its wavelet coefficients at consecutive scales [8].
Therefore, pitch period estimation by means of the wavelet
transform is done by determining the GCIs and measuring the
elapsed time between two adjacent GCIs.
In this paper, we propose a new method based on the wavelet
transform that estimates the pitch period with high accuracy.

The rest of the paper is organized as follows: Section II reviews
AMDF, CAMDF, EAMDF and EMDAMDF, and then proposes a pitch
detection algorithm based on the wavelet transform. Section III
gives the results of the comparative experiments and discussion.
Finally, the paper is concluded in Section IV.

II. Material and Methodology 

A. Review of AMDF and Its Improvements 

The conventional AMDF was proposed by Ross et al. in 
1974 [2] and it is defined as follows: 

$$D(\tau) = \sum_{n=0}^{N-\tau-1} \left| x(n) - x(n+\tau) \right| \tag{1}$$

where $x(n)$ denotes a voiced speech frame multiplied by a
rectangular window of length $N$, and $\tau$ denotes the lag
number. As shown in Fig. 1(b), the AMDF yields a double-pitch
estimate instead of the true pitch. The speech in this figure is
a female voiced frame (Fig. 1(a)) [9].
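
As an illustration of Eq. (1), the following minimal Python sketch computes the AMDF of one frame and picks the deepest valley in a lag range. The function names `amdf` and `pitch_lag_from_amdf` and the lag-range arguments are assumptions made for this example; they do not come from the paper.

```python
def amdf(frame, max_lag):
    """AMDF of Eq. (1) for lags 0..max_lag (frame is a list of samples)."""
    n_samples = len(frame)
    values = []
    for lag in range(max_lag + 1):
        # Sum |x(n) - x(n + lag)| over n = 0 .. N - lag - 1.
        # The number of terms shrinks as the lag grows, which is the
        # origin of the falling trend discussed in the text.
        total = sum(abs(frame[n] - frame[n + lag])
                    for n in range(n_samples - lag))
        values.append(total)
    return values


def pitch_lag_from_amdf(values, min_lag, max_lag):
    """Hypothetical helper: lag of the deepest valley in [min_lag, max_lag]."""
    return min(range(min_lag, max_lag + 1), key=lambda lag: values[lag])
```

For example, `amdf([1, 2, 3, 4], 2)` returns `[0, 3, 4]`. For a signal with a period of 8 samples the AMDF is zero at lags 8 and 16 alike, so the outcome depends on how ties and near-ties between valleys are resolved, which is where double-pitch errors can arise.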
Fig. 1: Comparison between (b) AMDF, (c) EAMDF, (d) CAMDF, and
(e) EMDAMDF of (a) a female voiced speech frame [9].
In order to overcome the falling trend of AMDF, Circular AMDF
(CAMDF) was proposed in [5], and the description of CAMDF is
given by:

$$D_c(\tau) = \sum_{n=0}^{N-1} \left| x(\operatorname{mod}(n+\tau, N)) - x(n) \right| \tag{2}$$

where $\operatorname{mod}(n+\tau, N)$ represents the modulo
operation, meaning $(n+\tau)$ modulo $N$.
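
A minimal Python sketch of Eq. (2) is given below. The function name `camdf` is an assumption for this example. Because the index wraps around the frame, every lag sums over the same number of terms, which is why the falling trend disappears.

```python
def camdf(frame, max_lag):
    """Circular AMDF of Eq. (2) for lags 0..max_lag."""
    n_samples = len(frame)
    return [
        # The index (n + lag) wraps around the frame, so each lag uses N terms.
        sum(abs(frame[(n + lag) % n_samples] - frame[n])
            for n in range(n_samples))
        for lag in range(max_lag + 1)
    ]
```

For example, `camdf([1, 2, 3, 4], 3)` returns `[0, 6, 8, 6]`. The result is symmetric about half the frame length, so only lags up to N/2 carry independent information.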

From Fig. 1(d), we can see that CAMDF eliminates the falling
trend, but the double-pitch error still occurs.
In [6], the extended AMDF (EAMDF) was proposed and high accuracy
was reported. EAMDF is defined as follows:



(3) [equation not recoverable; see [6] for the EAMDF definition]



EAMDF can overcome the falling trend of AMDF.

Fig. 1(c) shows the EAMDF of the same speech frame. We can see
that the double-pitch error is not eliminated.

Empirical mode decomposition AMDF (EMDAMDF) was proposed in [9].
EMDAMDF is defined as follows:



$$D_{\mathrm{EMDAMDF}}(\tau) = \sum_{n} c_n(\tau) \tag{4}$$



In contrast to the original AMDF, EMDAMDF eliminates the falling
trend efficiently and adaptively by using EMD. It can be seen in
Fig. 1(e) that EMDAMDF detects the pitch period.

B. Wavelet Transform 

The wavelet transform (WT) can be classified as either the
continuous wavelet transform (CWT) or the discrete wavelet
transform (DWT). The continuous wavelet transform of a signal
$x(t) \in L^2(\mathbb{R})$ is:

$$WT_x(\omega, \tau) = \frac{1}{\sqrt{\omega}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t-\tau}{\omega}\right) dt, \quad \omega > 0 \tag{5}$$

where the function $\psi(t)$ is usually referred to as the mother
wavelet, $\omega$ is the scaling factor, $\tau$ is the shift and
$*$ stands for complex conjugation. The DWT can be performed via
the multiresolution analysis wavelet decomposition/reconstruction
algorithm developed by Mallat. At the $m$-th level, the
multiresolution space $V_m$ is spanned by the basis functions
$\{2^{-m/2}\varphi(2^{-m}t - n);\ n \in \mathbb{Z}\}$, and the
space $W_m$, orthogonal to $V_m$ in $V_{m-1}$, is spanned by
$\{2^{-m/2}\psi(2^{-m}t - n);\ n \in \mathbb{Z}\}$, where
$\varphi(t)$ is called the scaling function and $\psi(t)$ is
called the wavelet function. Mallat's algorithm allows the
wavelet coefficients (also called the detailed version of the
signal), $d_{m,n} = \langle x(t), \psi_{m,n} \rangle$, and the
scaling coefficients (also called the approximation version),
$x_{m,n} = \langle x(t), \varphi_{m,n} \rangle$, at the $m$-th
scale to be calculated recursively from the representation of the
signal $x(t)$ at the preceding, finer scale, $x_{m-1,n}$, through
the following filtering operations:

$$x_{m,n} = \sum_{k} a_0(k - 2n)\, x_{m-1,k} \tag{6}$$

$$d_{m,n} = \sum_{k} a_1(k - 2n)\, x_{m-1,k} \tag{7}$$

where $a_0(n) = \langle \varphi_{1,0}, \varphi_{0,n} \rangle$ and
$a_1(n) = \langle \psi_{1,0}, \varphi_{0,n} \rangle$.
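
To make the filtering operations (6) and (7) concrete, the sketch below applies one decomposition level with the Haar wavelet. For Haar, the filters reduce to $a_0 = (1/\sqrt{2}, 1/\sqrt{2})$ and $a_1 = (1/\sqrt{2}, -1/\sqrt{2})$ under the usual sign convention, which is assumed here. The function name `haar_dwt_step` is hypothetical, and an odd trailing sample is simply dropped, which is also an assumption of this example.

```python
import math


def haar_dwt_step(approx):
    """One level of Eqs. (6)-(7) with Haar filters.

    approx: scaling coefficients x_{m-1,k} at the finer scale.
    Returns (x_m, d_m): scaling and wavelet coefficients at scale m.
    """
    s = 1.0 / math.sqrt(2.0)
    pairs = len(approx) // 2  # an odd trailing sample is dropped (assumption)
    # Eq. (6): a_0(k - 2n) is non-zero only for k = 2n and k = 2n + 1.
    coarse = [s * (approx[2 * n] + approx[2 * n + 1]) for n in range(pairs)]
    # Eq. (7): a_1 has the same support with opposite signs.
    detail = [s * (approx[2 * n] - approx[2 * n + 1]) for n in range(pairs)]
    return coarse, detail
```

Applying the function repeatedly to the returned scaling coefficients gives the coefficients at successively coarser scales. Since the Haar filters are orthonormal, the energy of the input equals the combined energy of the two outputs for an even-length input.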

C. Proposed Pitch Detection Algorithm Based on DWT

First, the original signal is segmented by windowing it with
a length equal to the approximate duration of a phoneme
(i.e. 26.5 ms), with a hop of 10 ms from each window to the
next. Then the wavelet transform of each segment is
calculated at the consecutive scales 2, 3, 4 and 5.
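
The segmentation step can be sketched as follows. At the 20 kHz sampling rate of the Keele data, a 26.5 ms window is 530 samples and a 10 ms hop is 200 samples. The function name `split_into_frames` is hypothetical, and discarding a trailing partial window is an assumption of this example, not something stated in the text.

```python
def split_into_frames(signal, sample_rate_hz, window_ms=26.5, hop_ms=10.0):
    """Cut a signal into overlapping rectangular-window frames."""
    window = int(round(window_ms * sample_rate_hz / 1000.0))
    hop = int(round(hop_ms * sample_rate_hz / 1000.0))
    frames = []
    start = 0
    # Assumption: a trailing partial window is discarded.
    while start + window <= len(signal):
        frames.append(signal[start:start + window])
        start += hop
    return frames
```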

After carrying out the above procedure, the local maxima of
the wavelet coefficients whose value is greater than 70% of
the global maximum of the segment are chosen. Among these
local maxima, if the distance between the locations of two
consecutive local maxima of the segment is less than the
shortest pitch period in the speech signal (i.e. 3 ms), the
local maximum with the higher amplitude is kept and the
other one is eliminated.
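
The peak selection described above can be sketched in Python as follows. This is only one possible reading of the procedure: the function name `select_local_maxima`, the use of signed coefficient values, and the left-to-right resolution of peaks that are too close together are all assumptions of this example.

```python
def select_local_maxima(coeffs, sample_rate_hz,
                        rel_threshold=0.7, min_period_ms=3.0):
    """Return indices of the eligible local maxima of one scale."""
    threshold = rel_threshold * max(coeffs)
    min_distance = min_period_ms * sample_rate_hz / 1000.0  # in samples
    # Local maxima that exceed 70% of the global maximum of the segment.
    candidates = [
        i for i in range(1, len(coeffs) - 1)
        if coeffs[i] > coeffs[i - 1]
        and coeffs[i] >= coeffs[i + 1]
        and coeffs[i] > threshold
    ]
    kept = []
    for i in candidates:
        if kept and i - kept[-1] < min_distance:
            # Closer than the shortest pitch period: keep the larger peak.
            if coeffs[i] > coeffs[kept[-1]]:
                kept[-1] = i
        else:
            kept.append(i)
    return kept
```

With a 1 kHz sampling rate, where 3 ms corresponds to 3 samples, `select_local_maxima([0, 10, 0, 9, 0, 0, 0, 8, 0, 2, 0], 1000)` returns `[1, 7]`: the peak at index 3 is too close to the larger peak at index 1, and the peak at index 9 is below the 70% threshold.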

If the locations of these extracted local maxima are the
same for at least two consecutive scales, the segment is
considered to be voiced, and the pitch period is obtained
by measuring the distance between these local maxima. Where
the locations of the local maxima do not coincide, the
segment is considered to be unvoiced, and the pitch period
is taken as zero.
In the present work, the Haar wavelet is employed to
estimate the pitch period.
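
A minimal sketch of the voiced/unvoiced decision is given below. It assumes that the peak locations of all scales are expressed on a common sample axis, that two locations "coincide" when they differ by at most `tolerance` samples, that at least two coinciding peaks are needed to measure a distance, and that the pitch period is the mean distance between adjacent coinciding peaks. None of these details are specified in the text, and the function name `estimate_pitch_period` is hypothetical.

```python
def estimate_pitch_period(peaks_by_scale, sample_rate_hz, tolerance=0):
    """Return the pitch period in seconds, or 0.0 for an unvoiced segment.

    peaks_by_scale: one list of peak sample indices per scale,
    ordered by consecutive scale.
    """
    for finer, coarser in zip(peaks_by_scale, peaks_by_scale[1:]):
        # Peaks of one scale that also appear at the next scale.
        common = [p for p in finer
                  if any(abs(p - q) <= tolerance for q in coarser)]
        if len(common) >= 2:
            gaps = [b - a for a, b in zip(common, common[1:])]
            mean_gap = sum(gaps) / len(gaps)
            return mean_gap / sample_rate_hz
    # No pair of consecutive scales agrees: treat the segment as unvoiced.
    return 0.0
```

For instance, with peaks at samples 100, 300 and 500 confirmed at the next scale and a 20 kHz sampling rate, the mean gap of 200 samples corresponds to a pitch period of 10 ms.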

III. Results and Tables 

We use the Keele pitch extraction reference database [10],
obtained from ftp://ftp.cs.keele.ac.uk/pub/pitch/, to test the
performance of the proposed algorithm. Speech from both female
(F1, F2, F3) and male (M2, M3, M4) speakers is used. The speech
data is sampled at 20 kHz with 16-bit resolution. The reference
pitch values are provided at a 100 Hz frame rate with a 26.5 ms
rectangular window. Frames whose reference pitch is recorded as
'-1' in the database are manually removed.
Fig. 2 shows the eligible local maxima of the wavelet
coefficients of a female voiced speech frame after carrying out
the procedure of the proposed method.
Fig. 2: The eligible local maxima of wavelet coefficients.

We evaluate the AMDF, CAMDF, EAMDF, EMDAMDF and the proposed
wavelet-transform-based pitch detection algorithms on the Keele
pitch database. According to the definition of Rabiner [11], if
the detected pitch period for a frame differs by more than 1 ms
from the reference value, the error is defined as a gross pitch
error (GPE). The errors are reported as the percentage of GPE,
denoted %GPE.
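
The %GPE measure can be computed as in the following sketch. It assumes that both sequences hold pitch periods in milliseconds for the same frames and that frames without a valid reference have already been removed. The function name `gross_pitch_error_percent` is hypothetical.

```python
def gross_pitch_error_percent(estimated_ms, reference_ms, tolerance_ms=1.0):
    """Percentage of frames whose pitch period error exceeds tolerance_ms."""
    if len(estimated_ms) != len(reference_ms) or not reference_ms:
        raise ValueError("need two non-empty sequences of equal length")
    gross = sum(1 for est, ref in zip(estimated_ms, reference_ms)
                if abs(est - ref) > tolerance_ms)
    return 100.0 * gross / len(reference_ms)
```

For example, with estimates `[5.0, 5.2, 10.0, 4.0]` and references `[5.0, 5.0, 5.0, 5.5]`, two of the four frames deviate by more than 1 ms, giving a %GPE of 50.0.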

Table 1. Comparison of Different Algorithms in Terms of %GPE on Female

                   F1       F2       F3
AMDF               22.66    11.93    13.11
CAMDF               9.34     5.73     7.75
EAMDF               7.51     4.58     5.03
EMDAMDF             6.07     3.84     4.63
Proposed Method     2.64     2.90     1.67



Table 2. Comparison of Different Algorithms in Terms of %GPE on Male

                   M2       M3       M4
AMDF                9.92    21.31    19.51
CAMDF               7.32    22.04    17.87
EAMDF               3.08    11.33     9.42
EMDAMDF             2.81     9.10     8.35
Proposed Method     5.52     5.52     7.39



Table 1 and Table 2 give the %GPE of the different algorithms
for female and male speech, respectively. From these two tables,
we can see that the proposed wavelet-transform-based pitch
detection algorithm performs better than all the AMDF-based
algorithms for both female and male speakers, except for M2,
where EAMDF and EMDAMDF achieve a lower %GPE.

It is also observed that the superiority of the proposed method
over the original AMDF and its improvements is most evident on
female speech.



IV. Conclusion

In this paper, we present a pitch detection algorithm based on
the wavelet transform. A simulated pitch detection experiment is
conducted on the Keele database. The results show that the
proposed wavelet-transform-based method outperforms the AMDF and
its improvements, namely CAMDF, EAMDF and EMDAMDF.